This paper deals with chain graphs under the classic
Lauritzen-Wermuth-Frydenberg interpretation. We prove that the regular
Gaussian distributions that factorize with respect to a chain graph G with
d parameters have positive Lebesgue measure with respect to $\mathbb{R}^d$,
whereas those that factorize with respect to G but are not faithful
to it have zero Lebesgue measure with respect to $\mathbb{R}^d$. This means
that, in the measure-theoretic sense described, almost all the regular
Gaussian distributions that factorize with respect to G are faithful
to it.



1. Introduction. This paper deals with chain graphs under the classic
Lauritzen-Wermuth-Frydenberg interpretation. We prove that the regular
Gaussian distributions that factorize with respect to a chain graph
G with d parameters have positive Lebesgue measure with respect to
$\mathbb{R}^d$, whereas those that factorize with respect to G but are not
faithful to it have zero Lebesgue measure with respect to $\mathbb{R}^d$.
This means that, in the measure-theoretic sense described, almost all the
regular Gaussian distributions that factorize with respect to G are faithful
to it. Previously, it has been proven that for any undirected graph there
exists a regular Gaussian distribution that is faithful to it (Lnenicka &
Matus, 2007, Corollary 3). A stronger result has been proven for acyclic
directed graphs: In a certain measure-theoretic sense, almost all the regular
Gaussian distributions that factorize with respect to an acyclic directed
graph are faithful to it (Spirtes et al., 1993, Theorem 3.2). Therefore, this
paper extends the latter result to chain graphs. It is worth mentioning that
we have recently proved in (Peña, 2009) a result analogous to the one in this
paper but for strictly positive discrete probability distributions with
arbitrary prescribed sample space. It is also worth noting that a result
analogous to the one in this paper has been proven in (Levitz et al., 2001,
Theorem 6.1) under the alternative Andersson-Madigan-Perlman interpretation
of chain graphs.
There are two important implications of the result proven in this paper: 

• The use of chain graphs to represent independence models in artificial
intelligence and statistics has increased over the years, particularly
in the case of undirected graphs and acyclic directed graphs.[1]
However, there are independence models that can be represented exactly
by chain graphs but that cannot be represented exactly by undirected
graphs or acyclic directed graphs. As a matter of fact, the experimental
results in (Peña, 2007) suggest that this may be the case for
the vast majority of independence models that can be represented
exactly by chain graphs. In other words, for most chain graphs, every
undirected graph and acyclic directed graph either represents some
separation statement that is not represented by the chain graph or
does not represent some separation statement that is represented by
the chain graph. As Studeny (2005, Section 1.1) points out, something
that would confirm that this is an advantage of chain graphs for
modeling regular Gaussian distributions would be proving that any
independence model represented by a chain graph can be represented
by a regular Gaussian distribution. The result in this paper confirms
this point.

• In the literature, there exist two graphical criteria for identifying inde- 
pendencies holding in a probability distribution p that factorizes with 
respect to a chain graph G: The moralization criterion (Lauritzen, 
1996) and the c-separation criterion (Studeny, 1998). Both criteria are 
known to be equivalent (Studeny, 1998, Lemma 5.1). Furthermore, 
both criteria are known to be sound, i.e. they only identify indepen- 
dencies in p (Lauritzen, 1996, Theorems 3.34 and 3.36). The result 
in this paper implies that both criteria are also complete for regular 
Gaussian distributions: If p is a regular Gaussian distribution, then 
both criteria identify all the independencies in p that can be identi- 
fied on the sole basis of G, because there exists a regular Gaussian 
distribution that is faithful to G. 

The rest of the paper is organized as follows. We start by reviewing some 
concepts in Section 2. In Section 3, we describe how we parameterize the 
regular Gaussian distributions that factorize with respect to a chain graph. 
We present our results on faithfulness in Section 4. In Section 5, we present 
some results about chain graph equivalence that follow from the results in 
Section 4. Finally, we close with some discussion in Section 6. 



[1] In this paper, we do not consider graphs with multiple edges between two nodes.

2. Preliminaries. In this section, we define some concepts used later
in this paper. We first recall some definitions from probabilistic graphical
models. See, for instance, (Lauritzen, 1996) and (Studeny, 2005) for further
information. Let $V = \{1, \ldots, N\}$ be a finite set of size N. The
elements of V are not distinguished from singletons and the union of the sets
$I_1, \ldots, I_l \subseteq V$ is written as the juxtaposition
$I_1 \ldots I_l$. We denote by $|I|$ the size or cardinality of a set
$I \subseteq V$, e.g. $|V| = N$. We assume throughout the paper that
the union of sets precedes the set difference when evaluating an expression.
Unless otherwise stated, all the graphs in this paper are defined over V.

If a graph G contains an undirected (resp. directed) edge between two
nodes $v_1$ and $v_2$, then we write that $v_1 - v_2$ (resp. $v_1 \to v_2$)
is in G. If $v_1 \to v_2$ is in G then $v_1$ is called a parent of $v_2$.
Let $Pa_G(I)$ denote the set of parents in G of the nodes in $I \subseteq V$.
When G is evident from the context, we drop the G from $Pa_G(I)$ and use
$Pa(I)$ instead. A route from a node $v_1$ to a node $v_l$ in a graph G is a
sequence of nodes $v_1, \ldots, v_l$ such that there exists an edge in G
between $v_i$ and $v_{i+1}$ for all $1 \le i < l$. The length of a
route is the number of (not necessarily distinct) edges in the route, e.g. the
length of the route $v_1, \ldots, v_l$ is $l - 1$. We treat all singletons as
routes of length zero. A path is a route in which the nodes
$v_1, \ldots, v_l$ are distinct. A route is called undirected if
$v_i - v_{i+1}$ is in G for all $1 \le i < l$. A route is called descending
if $v_i - v_{i+1}$ or $v_i \to v_{i+1}$ is in G for all $1 \le i < l$. If
there is a descending route from $v_1$ to $v_l$ in G, then $v_1$ is called an
ancestor of $v_l$ and $v_l$ is called a descendant of $v_1$. Let $An_G(I)$
denote the set of ancestors in G of the nodes in $I \subseteq V$. A descending
route $v_1, \ldots, v_l$ is called a directed pseudocycle if
$v_i \to v_{i+1}$ is in G for some $1 \le i < l$, and $v_l = v_1$. A chain
graph (CG) is a graph (possibly) containing both undirected and directed
edges and no directed pseudocycles. An undirected graph (UG) is a CG
containing only undirected edges. The underlying UG of a CG is the UG
resulting from replacing the directed edges in the CG by undirected edges.
A set of nodes of a CG is connected if there exists an undirected route in
the CG between every pair of nodes in the set. A connectivity component
of a CG is a connected set that is maximal with respect to set inclusion.
Hereinafter, we assume that the connectivity components $B_1, \ldots, B_n$
of a CG G are well-ordered, i.e. if $v_1 \to v_2$ is in G then $v_1 \in B_i$
and $v_2 \in B_j$ for some $1 \le i < j \le n$. The moral graph of a CG G,
denoted $G^m$, is the undirected graph where two nodes are adjacent iff they
are adjacent in G or they are both in $Pa(B_i)$ for some connectivity
component $B_i$ of G. The subgraph of G induced by $I \subseteq V$, denoted
$G_I$, is the graph over I where two nodes are connected by a (un)directed
edge if that edge is in G. A path $v_1, \ldots, v_l$ in G is called a complex
if the subgraph of G induced by the set of nodes in the path looks like
$v_1 \to v_2 - \cdots - v_{l-1} \leftarrow v_l$. The path
$v_2, \ldots, v_{l-1}$ is called the region of the complex. A section of a
route $\rho$ in a CG is a maximal subroute of $\rho$ that only contains
undirected edges. A section $v_2 - \cdots - v_{l-1}$ of $\rho$ is a collider
section of $\rho$ if $v_1 \to v_2 - \cdots - v_{l-1} \leftarrow v_l$ is a
subroute of $\rho$. Furthermore, a route $\rho$ in a CG is said to be
superactive with respect to $K \subseteq V$ when

• every collider section of $\rho$ has some node in K, and

• every other section of $\rho$ has no node in K.

A set $I \subseteq V$ is complete in a UG G if there is an undirected edge
in G between every pair of distinct nodes in I. We denote the set of complete
sets in G by $\mathcal{C}(G)$. We treat all singletons as complete sets and,
thus, they are included in $\mathcal{C}(G)$.

Let $X = (X_i)_{i \in V}$ denote a column vector of random variables and
$X_I$ ($I \subseteq V$) its subvector $(X_i)_{i \in I}$. We use upper-case
letters to denote random variables and the same letters in lower-case to
denote their states. Unless otherwise stated, all the probability
distributions in this paper are defined on (state space) $\mathbb{R}^N$. Let
I, J and K denote three disjoint subsets of V. We denote by
$I \perp_p J | K$ that $X_I$ is independent of $X_J$ given $X_K$ in a
probability distribution p. Likewise, we denote by $I \perp_G J | K$ that I is
separated from J given K in a CG G. Specifically, $I \perp_G J | K$ holds when
there is no route in G from a node in I to a node in J that is superactive
with respect to K. This is equivalent to saying that $I \perp_G J | K$ holds
when every path in $(G_{An_G(IJK)})^m$ from a node in I to a node in J has
some node in K. The independence model represented by a CG G is the set of
separation statements $I \perp_G J | K$. We say that a probability
distribution p is Markovian with respect to a CG G when $I \perp_p J | K$ if
$I \perp_G J | K$ for all I, J and K disjoint subsets of V. We say that p is
faithful to G when $I \perp_p J | K$ iff $I \perp_G J | K$ for all I, J and K
disjoint subsets of V. We denote by $I \not\perp_p J | K$ and
$I \not\perp_G J | K$ that $I \perp_p J | K$ and $I \perp_G J | K$ do not
hold, respectively.
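
The moralization form of the separation criterion recalled above can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions: the function names (`ancestors`, `components`, `moral_graph`, `separated`) and the representation of a CG as two lists of node pairs are hypothetical choices made for this example, not part of the cited works.

```python
from itertools import combinations

def ancestors(nodes, undirected, directed):
    """Nodes with a descending route into `nodes` (every node is its own ancestor)."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        v = frontier.pop()
        # u -> v or u - v makes u an ancestor of v.
        preds = {a for a, b in directed if b == v}
        preds |= {a for a, b in undirected if b == v}
        preds |= {b for a, b in undirected if a == v}
        for u in preds - result:
            result.add(u)
            frontier.append(u)
    return result

def components(nodes, undirected):
    """Connectivity components: maximal sets joined by undirected routes."""
    remaining, comps = set(nodes), []
    while remaining:
        comp, frontier = set(), [remaining.pop()]
        while frontier:
            v = frontier.pop()
            comp.add(v)
            nbrs = {b for a, b in undirected if a == v}
            nbrs |= {a for a, b in undirected if b == v}
            new = nbrs & remaining
            remaining -= new
            frontier.extend(new)
        comps.append(comp)
    return comps

def moral_graph(nodes, undirected, directed):
    """Adjacency sets of the moral graph of the subgraph induced by `nodes`."""
    und = [(a, b) for a, b in undirected if a in nodes and b in nodes]
    dirs = [(a, b) for a, b in directed if a in nodes and b in nodes]
    adj = {v: set() for v in nodes}
    for a, b in und + dirs:
        adj[a].add(b)
        adj[b].add(a)
    for comp in components(nodes, und):
        parents = {a for a, b in dirs if b in comp}
        for a, b in combinations(parents, 2):  # marry the parents of a component
            adj[a].add(b)
            adj[b].add(a)
    return adj

def separated(I, J, K, undirected, directed):
    """True iff every path from I to J in the moral ancestral graph hits K."""
    keep = ancestors(set(I) | set(J) | set(K), undirected, directed)
    adj = moral_graph(keep, undirected, directed)
    seen, frontier = set(I), list(I)
    while frontier:
        v = frontier.pop()
        if v in J:
            return False
        for u in adj[v] - seen - set(K):
            seen.add(u)
            frontier.append(u)
    return True

# Example CG with a complex: 1 -> 3 - 4 <- 2.
UND = [(3, 4)]
DIRS = [(1, 3), (2, 4)]
print(separated({1}, {2}, set(), UND, DIRS))
print(separated({1}, {2}, {3}, UND, DIRS))
```

In the example, conditioning on node 3 pulls the whole component {3, 4} into the ancestral set, so its parents 1 and 2 become adjacent in the moral graph.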

We now recall some results from matrix theory. See, for instance, (Horn and
Johnson, 1985) for more information. Let $A = (A_{i,j})_{i,j \in V}$ denote a
square matrix. Let $A_{I,J}$ with $I, J \subseteq V$ denote its submatrix
$(A_{i,j})_{i \in I, j \in J}$. The determinant of A can recursively be
computed, for fixed $i \in V$, as
$\det(A) = \sum_{j \in V} (-1)^{i+j} A_{i,j} \det(A_{\setminus(ij)})$, where
$A_{\setminus(ij)}$ denotes the matrix produced by removing the row i and
column j from A. If $\det(A) \neq 0$ then the inverse of A can be computed as
$(A^{-1})_{i,j} = (-1)^{i+j} \det(A_{\setminus(ji)}) / \det(A)$ for all
$i, j \in V$. We say that A is strictly diagonally dominant if
$abs(A_{i,i}) > \sum_{\{j \in V : j \neq i\}} abs(A_{i,j})$ for all
$i \in V$, where $abs()$ denotes absolute value. A matrix A is Hermitian if
it is equal to the matrix resulting from, first, transposing A and, then,
replacing each entry by its complex conjugate. Clearly, a real symmetric
matrix is Hermitian. A real symmetric $N \times N$ matrix A is positive
definite if $y^T A y > 0$ for all non-zero $y \in \mathbb{R}^N$.

Remark 2.1. Note that $\det(A)$ is a real polynomial in the entries of A,
and that $(A^{-1})_{i,j}$ is then the restriction of a fraction of two real
polynomials in the entries of A to the area where $\det(A)$ is non-zero.

Finally, we recall some results about Gaussian distributions. We represent
a Gaussian distribution as $\mathcal{N}(\mu, \Sigma)$ where $\mu$ is its mean
vector and $\Sigma$ its covariance matrix. We say that a Gaussian distribution
$\mathcal{N}(\mu, \Sigma)$ is regular if $\Sigma$ is positive definite or,
equivalently, invertible. In this paper, we often find it more convenient to
work with the inverse of the covariance matrix $\Omega = \Sigma^{-1}$, which
is also known as the concentration matrix or precision matrix. Since
$\Sigma = \Omega^{-1}$, we thus often write $\mathcal{N}(\mu, \Omega^{-1})$
instead of $\mathcal{N}(\mu, \Sigma)$. Let I, J, K and L denote four disjoint
subsets of V. Any regular Gaussian distribution p satisfies, among others,
the following properties:

• Symmetry $I \perp_p J | K \Rightarrow J \perp_p I | K$.

• Decomposition $I \perp_p JL | K \Rightarrow I \perp_p J | K$.

• Intersection $I \perp_p J | KL \wedge I \perp_p L | KJ \Rightarrow I \perp_p JL | K$.

• Weak transitivity $I \perp_p J | K \wedge I \perp_p J | Ku \Rightarrow I \perp_p u | K \vee u \perp_p J | K$
with $u \in V \setminus IJK$.

The following results have been proven in (Bishop, 2006, Sections 2.3.1,
2.3.3). For the sake of completeness, Appendix A adapts the proofs to the
notation used in this paper. Let I and J denote two disjoint subsets of V. Let
$p(x_{IJ}) = \mathcal{N}(\mu, \Omega^{-1})$ where $\Omega$ is positive
definite. Then, as shown in (Bishop, 2006, Section 2.3.1) and in Appendix A,
$p(x_J | x_I) = \mathcal{N}(\delta x_I + \gamma, \epsilon^{-1})$ where
$\delta$, $\gamma$ and $\epsilon$ are the following real matrices of
dimensions, respectively, $|J| \times |I|$, $|J| \times 1$ and
$|J| \times |J|$:

(2.1)   $\delta = -(\Omega_{J,J})^{-1} \Omega_{J,I},$

(2.2)   $\gamma = \mu_J + (\Omega_{J,J})^{-1} \Omega_{J,I} \mu_I$

and

(2.3)   $\epsilon = \Omega_{J,J}.$
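
As a quick sanity check of Equations 2.1-2.3, the following worked example (our own illustration, with numbers chosen arbitrarily) takes $V = \{1, 2\}$, $I = \{1\}$ and $J = \{2\}$, and compares the result with the usual covariance-based formulas for a conditional Gaussian.

```latex
\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad
\Omega = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \qquad
\Sigma = \Omega^{-1} = \frac{1}{3} \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

% Equations 2.1-2.3 with I = {1} and J = {2}:
\delta = -(\Omega_{J,J})^{-1} \Omega_{J,I} = -\tfrac{1}{2} \cdot (-1) = \tfrac{1}{2}, \qquad
\gamma = \mu_J + (\Omega_{J,J})^{-1} \Omega_{J,I} \mu_I = 2 + \tfrac{1}{2} \cdot (-1) \cdot 1 = \tfrac{3}{2}, \qquad
\epsilon = \Omega_{J,J} = 2.

% Hence p(x_2 | x_1) = N(x_1 / 2 + 3/2, 1/2). Check via the covariance matrix:
E[X_2 | x_1] = \mu_2 + \frac{\Sigma_{2,1}}{\Sigma_{1,1}} (x_1 - \mu_1)
             = 2 + \tfrac{1}{2} (x_1 - 1) = \tfrac{1}{2} x_1 + \tfrac{3}{2}, \qquad
Var[X_2 | x_1] = \Sigma_{2,2} - \frac{\Sigma_{2,1}^2}{\Sigma_{1,1}}
               = \tfrac{2}{3} - \tfrac{1}{6} = \tfrac{1}{2} = \epsilon^{-1}.
```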

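Let $p(x_I) = \mathcal{N}(\alpha, \beta^{-1})$ and
$q(x_J | x_I) = \mathcal{N}(\delta x_I + \gamma, \epsilon^{-1})$ where
$\delta$, $\gamma$ and $\epsilon$ are real matrices of dimensions,
respectively, $|J| \times |I|$, $|J| \times 1$ and $|J| \times |J|$, and
$\beta$ and $\epsilon$ are positive definite. Then, as shown in (Bishop, 2006,
Section 2.3.3) and in Appendix A, $p(x_I) q(x_J | x_I)$ is a Gaussian
distribution $\mathcal{N}(\lambda, \Lambda^{-1})$ over $\mathbb{R}^{|IJ|}$
where

(2.4)   $\lambda = \begin{pmatrix} \alpha \\ \delta \alpha + \gamma \end{pmatrix}$

and

(2.5)   $\Lambda = \begin{pmatrix} \beta + \delta^T \epsilon \delta & -\delta^T \epsilon \\ -\epsilon \delta & \epsilon \end{pmatrix}.$

Moreover, $p(x_I) q(x_J | x_I)$ is regular because

(2.6)   $\Lambda^{-1} = \begin{pmatrix} \beta^{-1} & \beta^{-1} \delta^T \\ \delta \beta^{-1} & \epsilon^{-1} + \delta \beta^{-1} \delta^T \end{pmatrix}.$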
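
The block formulas in Equations 2.4-2.6 can be checked numerically. The sketch below is our own illustration and only covers the scalar case $|I| = |J| = 1$; the function names `joint_from_conditional` and `matmul` are hypothetical helpers introduced for this example.

```python
def joint_from_conditional(alpha, beta, delta, gamma, eps):
    """Scalar case (|I| = |J| = 1) of Equations 2.4-2.6."""
    lam = [alpha, delta * alpha + gamma]                  # Equation 2.4
    Lam = [[beta + delta * eps * delta, -delta * eps],    # Equation 2.5
           [-eps * delta, eps]]
    Lam_inv = [[1 / beta, delta / beta],                  # Equation 2.6
               [delta / beta, 1 / eps + delta * delta / beta]]
    return lam, Lam, Lam_inv

def matmul(A, B):
    """Plain product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Arbitrary illustrative values; beta and eps must be positive.
lam, Lam, Lam_inv = joint_from_conditional(alpha=1.0, beta=2.0, delta=0.5,
                                           gamma=-1.0, eps=4.0)
print(lam)
print(matmul(Lam, Lam_inv))  # should be the 2 x 2 identity matrix
```

If Equation 2.6 is indeed the inverse of Equation 2.5, the product printed last is the identity, which is what makes the joint distribution regular.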

3. Parameterization of chain graphs. In this section, we describe 
how we parameterize the regular Gaussian distributions that factorize with 
respect to a CG. This is a key issue because our results about faithfulness 
are not only relative to the CG at hand and the measure considered, the 
Lebesgue measure, but also to the number of parameters of the regular 
Gaussian distributions that factorize with respect to the CG at hand. 

We say that a regular Gaussian distribution p factorizes with respect
to a CG G with connectivity components $B_1, \ldots, B_n$ if the following
two conditions are met (Lauritzen, 1996, Proposition 3.30):

F1. $p(x) = \prod_{i=1}^{n} p(x_{B_i} | x_{Pa(B_i)})$ where

F2. $p(x_{B_i Pa(B_i)}) = \prod_{C \in \mathcal{C}((G_{B_i Pa(B_i)})^m)} \psi^i_C(x_C)$
where each $\psi^i_C(x_C)$ is a non-negative real function.

Let $\mathcal{N}(G)$ denote the set of regular Gaussian distributions that
factorize with respect to G. We parameterize each probability distribution
$p \in \mathcal{N}(G)$ with the following parameters:

• The mean vector $\mu$ of p.

• The submatrices $\Omega^i_{B_i,B_i}$ and $\Omega^i_{B_i,Pa(B_i)}$ of the
precision matrix $\Omega^i$ of $p(x_{B_i Pa(B_i)})$ for all $1 \le i \le n$.

We warn the reader that if $\Omega$ denotes the precision matrix of p, then
$\Omega^i$ is not $\Omega_{B_i Pa(B_i), B_i Pa(B_i)}$ but
$((\Omega^{-1})_{B_i Pa(B_i), B_i Pa(B_i)})^{-1}$. It is worth mentioning
that an alternative parameterization of the probability distributions in
$\mathcal{N}(G)$ is presented in (Wermuth, 1992). The main difference between
our parameterization and the alternative one is that we parameterize certain
concentration matrices whereas they parameterize certain partial
concentration matrices. However, both parameterizations are equivalent. We
omit the details of the equivalence because they are irrelevant for our
purpose. We stick to our parameterization simply because it is more
convenient for the calculations performed later in this paper.

Note that the values of some of the parameters in the parameterization
introduced above are determined by the values of the rest of the parameters.
Specifically, for all $1 \le i \le n$, the following constraints apply:

C1. $(\Omega^i_{B_i,B_i})_{j,k} = (\Omega^i_{B_i,B_i})_{k,j}$ for all
$j, k \in B_i$, because $\Omega^i_{j,k} = \Omega^i_{k,j}$ since $\Omega^i$
is symmetric.

C2. $(\Omega^i_{B_i,B_i})_{j,k} = 0$ for all $j, k \in B_i$ such that j and k
are not adjacent in G. To see it, note that j and k are not adjacent in
$(G_{B_i Pa(B_i)})^m$. Consequently, any path between j and k in
$(G_{B_i Pa(B_i)})^m$ must pass through some node in $B_i \setminus jk$ or
$Pa(B_i)$. Then, $j \perp_{(G_{B_i Pa(B_i)})^m} k | B_i Pa(B_i) \setminus jk$,
which implies $j \perp_{p(x_{B_i Pa(B_i)})} k | B_i Pa(B_i) \setminus jk$
because $p(x_{B_i Pa(B_i)})$ is Markovian with respect to
$(G_{B_i Pa(B_i)})^m$ due to the condition F2 above (Lauritzen, 1996,
Proposition 3.30, Theorems 3.34 and 3.36). The latter independence statement
implies $\Omega^i_{j,k} = 0$ and, thus, $(\Omega^i_{B_i,B_i})_{j,k} = 0$
(Lauritzen, 1996, Proposition 5.2).

C3. $(\Omega^i_{B_i,Pa(B_i)})_{j,k} = 0$ for all $j \in B_i$ and
$k \in Pa(B_i)$ such that j and k are not adjacent in G, by a reasoning
analogous to the one above.

Hereinafter, the parameters whose values are not determined by the
constraints above are called non-determined (nd) parameters. However, the
values the nd parameters can take are constrained by the fact that these
values must correspond to some probability distribution in $\mathcal{N}(G)$.
We prove in Lemma 3.1 that this is equivalent to requiring that the nd
parameters can only take real values such that $\Omega^i_{B_i,B_i}$ is
positive definite for all $1 \le i \le n$. That is why the set of nd
parameter values satisfying this requirement is hereinafter called the nd
parameter space for $\mathcal{N}(G)$. We do not work out the inequalities
defining the nd parameter space because these are irrelevant for our purpose.
The number of nd parameters is what we call the dimension of G, and we denote
it as d. Specifically, $d = 2|V| + |G|$ where $|G|$ is the number of edges
in G:

• $|V|$ due to $\mu$.

• $|V|$ due to $(\Omega^i_{B_i,B_i})_{j,j}$ for all $1 \le i \le n$ and
$j \in B_i$.

• $|G|$ due to the entries below the diagonal of $\Omega^i_{B_i,B_i}$ that
are not identically zero and the entries of $\Omega^i_{B_i,Pa(B_i)}$ that are
not identically zero for all $1 \le i \le n$. To see this, recall from the
constraints C1-C3 above that there is one entry below the diagonal in some
$\Omega^i_{B_i,B_i}$ that is not identically zero for each undirected edge in
G, and one entry in some $\Omega^i_{B_i,Pa(B_i)}$ that is not identically
zero for each directed edge in G.
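
The count $d = 2|V| + |G|$ can be made concrete by listing the nd parameters of a small CG. The sketch below is our own illustration; the function name `nd_parameters` and the tuple labels are hypothetical and only serve to name the parameters described in the three bullets above.

```python
def nd_parameters(nodes, undirected, directed):
    """List one label per non-determined parameter of a CG.

    `undirected` holds pairs (a, b) for a - b and `directed` holds pairs
    (a, b) for a -> b; both are assumed to describe a valid chain graph.
    """
    params = [("mean", v) for v in nodes]                   # |V| entries of mu
    params += [("precision_diagonal", v) for v in nodes]    # |V| diagonal entries
    params += [("precision_undirected", a, b) for a, b in undirected]
    params += [("precision_directed", child, parent) for parent, child in directed]
    return params

# Example CG: 1 -> 3 - 4 <- 2, so |V| = 4 and |G| = 3.
params = nd_parameters([1, 2, 3, 4], [(3, 4)], [(1, 3), (2, 4)])
print(len(params))
```

For this example the list has $2 \cdot 4 + 3 = 11$ entries, matching the dimension formula.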

Lemma 3.1. Let G be a CG. There is a one-to-one correspondence between
the probability distributions in $\mathcal{N}(G)$ and the elements of the nd
parameter space for $\mathcal{N}(G)$.

Proof. We first prove that the mapping of probability distributions
into nd parameter values is injective. Obviously, any probability
distribution $p \in \mathcal{N}(G)$ is mapped into some real values of the nd
parameters $\mu$, $\Omega^i_{B_i,B_i}$ and $\Omega^i_{B_i,Pa(B_i)}$ for all
$1 \le i \le n$. In particular, $\Omega^i_{B_i,B_i}$ takes value
$(((\Omega^{-1})_{B_i Pa(B_i), B_i Pa(B_i)})^{-1})_{B_i,B_i}$ where $\Omega$
is the precision matrix of p. Then, that $\Omega^i_{B_i,B_i}$ is positive
definite follows from the fact that $\Omega$ is positive definite (Studeny,
2005, p. 237). Thus, p is mapped into some element of the nd parameter space
for $\mathcal{N}(G)$.

Moreover, different probability distributions are mapped into different
elements. To see it, assume to the contrary that there exist two distinct
probability distributions $p, p' \in \mathcal{N}(G)$ that are mapped into the
same element. Note that this element uniquely identifies
$p(x_{B_i} | x_{Pa(B_i)})$ by Equations 2.1-2.3 for all $1 \le i \le n$,
where $I = Pa(B_i)$ and $J = B_i$. Likewise, it uniquely identifies
$p'(x_{B_i} | x_{Pa(B_i)})$ for all $1 \le i \le n$. Then,
$p(x_{B_i} | x_{Pa(B_i)}) = p'(x_{B_i} | x_{Pa(B_i)})$ for all
$1 \le i \le n$. However, this contradicts the assumption that p and p' are
distinct by the condition F1 above.

We now prove in three steps that the mapping of nd parameter values 
into probability distributions is injective. 

Step 1 We first show that any element of the nd parameter space for
$\mathcal{N}(G)$ is mapped into some regular Gaussian distribution q. Note
that any element of the nd parameter space for $\mathcal{N}(G)$ uniquely
identifies a Gaussian distribution $q^i(x_{B_i} | x_{Pa(B_i)})$ for all
$1 \le i \le n$ by Equations 2.1-2.3, where $I = Pa(B_i)$ and $J = B_i$.
Specifically,
$q^i(x_{B_i} | x_{Pa(B_i)}) = \mathcal{N}(\delta^i x_{Pa(B_i)} + \gamma^i, (\epsilon^i)^{-1})$
where

(3.1)   $\delta^i = -(\Omega^i_{B_i,B_i})^{-1} \Omega^i_{B_i,Pa(B_i)},$

(3.2)   $\gamma^i = \mu_{B_i} + (\Omega^i_{B_i,B_i})^{-1} \Omega^i_{B_i,Pa(B_i)} \mu_{Pa(B_i)}$

and

(3.3)   $\epsilon^i = \Omega^i_{B_i,B_i}.$

In the equations above, we have assumed that the values of all the entries
of $\Omega^i_{B_i,B_i}$ and $\Omega^i_{B_i,Pa(B_i)}$ have previously been
determined from the element of the nd parameter space at hand and the
constraints C1-C3 above. Furthermore, note that $q^i(x_{B_i} | x_{Pa(B_i)})$
is regular because, by definition, $\Omega^i_{B_i,B_i}$ is positive definite.
Clearly, $q^i(x_{B_i} | x_{Pa(B_i)})$ can be rewritten as a regular
Gaussian distribution $r^i(x_{B_i} | x_{B_1 \ldots B_{i-1}})$: It suffices to
take

$r^i(x_{B_i} | x_{B_1 \ldots B_{i-1}}) = \mathcal{N}\left( (\delta^i \;\; 0) \begin{pmatrix} x_{Pa(B_i)} \\ x_{B_1 \ldots B_{i-1} \setminus Pa(B_i)} \end{pmatrix} + \gamma^i, (\epsilon^i)^{-1} \right)$

where 0 is a matrix of zeroes of dimension
$|B_i| \times |B_1 \ldots B_{i-1} \setminus Pa(B_i)|$. Then,
$r^1(x_{B_1}) r^2(x_{B_2} | x_{B_1})$ is a regular Gaussian distribution by
Equations 2.4-2.6. Likewise,
$r^1(x_{B_1}) r^2(x_{B_2} | x_{B_1}) r^3(x_{B_3} | x_{B_1 B_2})$ is a regular
Gaussian distribution. Continuing with this process for the rest of
connectivity components proves that
$\prod_{i=1}^{n} q^i(x_{B_i} | x_{Pa(B_i)}) = \prod_{i=1}^{n} r^i(x_{B_i} | x_{B_1 \ldots B_{i-1}})$
is mapped into some regular Gaussian distribution q.

Step 2 We now show that $q \in \mathcal{N}(G)$. Note that for all
$1 \le i \le n$ and any fixed value of $x_{B_1 \ldots B_i}$

$\int \prod_{l=i+1}^{n} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_{i+1} \ldots B_n}$

$= \int q^{i+1}(x_{B_{i+1}} | x_{Pa(B_{i+1})}) \Big[ \int q^{i+2}(x_{B_{i+2}} | x_{Pa(B_{i+2})}) \Big[ \cdots \Big[ \int q^n(x_{B_n} | x_{Pa(B_n)}) \, dx_{B_n} \Big] \cdots \Big] dx_{B_{i+2}} \Big] dx_{B_{i+1}} = 1.$

Thus, for all $1 \le i \le n$, it follows from the equation above that

$q(x_{B_i Pa(B_i)}) = \int \prod_{l=1}^{n} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_1 \ldots B_n \setminus B_i Pa(B_i)}$

$= \int \Big[ \prod_{l=1}^{i} q^l(x_{B_l} | x_{Pa(B_l)}) \Big] \Big[ \int \prod_{l=i+1}^{n} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_{i+1} \ldots B_n} \Big] dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)}$

$= \int \prod_{l=1}^{i} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)}$

(3.4)   $= q^i(x_{B_i} | x_{Pa(B_i)}) \int \prod_{l=1}^{i-1} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)}.$

Moreover, for all $1 \le i \le n$, it follows from the equation above that

$q(x_{Pa(B_i)}) = \int q(x_{B_i Pa(B_i)}) \, dx_{B_i}$

$= \int \Big[ \int \prod_{l=1}^{i} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)} \Big] dx_{B_i}$

(3.5)   $= \int \Big[ \int \prod_{l=1}^{i} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_i} \Big] dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)}$

(3.6)   $= \int \Big[ \prod_{l=1}^{i-1} q^l(x_{B_l} | x_{Pa(B_l)}) \int q^i(x_{B_i} | x_{Pa(B_i)}) \, dx_{B_i} \Big] dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)}$

(3.7)   $= \int \prod_{l=1}^{i-1} q^l(x_{B_l} | x_{Pa(B_l)}) \, dx_{B_1 \ldots B_{i-1} \setminus Pa(B_i)}.$

Note the use of Fubini's theorem to change the order of integration and
produce Equation 3.5. This implies that the inner integral in Equation 3.6
becomes 1. Consequently, for all $1 \le i \le n$

(3.8)   $q(x_{B_i} | x_{Pa(B_i)}) = \frac{q(x_{B_i Pa(B_i)})}{q(x_{Pa(B_i)})} = q^i(x_{B_i} | x_{Pa(B_i)})$

due to Equations 3.4 and 3.7. Therefore,

$q(x) = \prod_{i=1}^{n} q^i(x_{B_i} | x_{Pa(B_i)}) = \prod_{i=1}^{n} q(x_{B_i} | x_{Pa(B_i)})$

and, thus, q satisfies the condition F1 above. Moreover, $q(x_{B_i Pa(B_i)})$
satisfies the condition F2 for all $1 \le i \le n$. We show this by induction
on i. Let $\Lambda^i$ denote the precision matrix of $q(x_{B_i Pa(B_i)})$, and
note that

$q(x_{B_i Pa(B_i)}) = q^i(x_{B_i} | x_{Pa(B_i)}) \, q(x_{Pa(B_i)})$

by Equation 3.8. So, $\Lambda^i$ can be calculated from
$q^i(x_{B_i} | x_{Pa(B_i)})$ and $q(x_{Pa(B_i)})$ via Equation 2.5.
Specifically, it follows from Equations 2.5 and 3.3, respectively 3.1, that

(3.9)   $\Lambda^i_{B_i,B_i} = \epsilon^i = \Omega^i_{B_i,B_i}$

and

(3.10)   $\Lambda^i_{B_i,Pa(B_i)} = -\epsilon^i \delta^i = -\Omega^i_{B_i,B_i} [-(\Omega^i_{B_i,B_i})^{-1} \Omega^i_{B_i,Pa(B_i)}] = \Omega^i_{B_i,Pa(B_i)}.$

Consequently, due to the constraints C2 and C3 above, $\Lambda^i_{j,k} = 0$
for all $j, k \in B_i Pa(B_i)$ such that j and k are not adjacent in
$(G_{B_i Pa(B_i)})^m$. Moreover, $\Lambda^i_{j,k} = 0$ is equivalent to
$j \perp_{q(x_{B_i Pa(B_i)})} k | B_i Pa(B_i) \setminus jk$ (Lauritzen, 1996,
Proposition 5.2). This implies that $q(x_{B_i Pa(B_i)})$ factorizes with
respect to $(G_{B_i Pa(B_i)})^m$ and, thus, that it satisfies the condition
F2 above (Lauritzen, 1996, Proposition 3.30, Theorems 3.34 and 3.36).
Consequently, $q \in \mathcal{N}(G)$.

Step 3 We finally show that different elements of the nd parameter space
for $\mathcal{N}(G)$ are mapped into different probability distributions in
$\mathcal{N}(G)$. Assume to the contrary that two distinct elements of the nd
parameter space for $\mathcal{N}(G)$ are mapped into the same probability
distribution $q \in \mathcal{N}(G)$. Assume that the two elements differ in
the value for $\mu_{B_i}$, $\Omega^i_{B_i,B_i}$ or $\Omega^i_{B_i,Pa(B_i)}$
but that they coincide in the values for $\mu_{B_l}$, $\Omega^l_{B_l,B_l}$
and $\Omega^l_{B_l,Pa(B_l)}$ for all $1 \le l < i$. There are two scenarios
to consider:

• If the two elements differ in the value for $\Omega^i_{B_i,B_i}$ or
$\Omega^i_{B_i,Pa(B_i)}$, then they are mapped into two different
$q(x_{B_i Pa(B_i)})$ by Equations 3.9 and 3.10, because two regular Gaussian
distributions with different precision matrices are different. However, this
contradicts the assumption that the two elements are mapped into the same q.

• If the two elements differ in the value for $\mu_{B_i}$ but they do not
differ in the values for $\Omega^i_{B_i,B_i}$ and $\Omega^i_{B_i,Pa(B_i)}$,
then the two elements do not differ in the value for $\mu_{Pa(B_i)}$ either,
because $Pa(B_i) \subseteq B_1 \ldots B_{i-1}$ and we assumed above that the
two elements coincide in the values for $\mu_{B_l}$ for all $1 \le l < i$.
Then, the two elements are mapped into the same $\delta^i$ but different
$\gamma^i$ in Equations 3.1 and 3.2. That is, the two elements are mapped
into two different $q^i(x_{B_i} | x_{Pa(B_i)})$ and, thus, to two different
$q(x_{B_i} | x_{Pa(B_i)})$ by Equation 3.8. However, this contradicts the
assumption that the two elements are mapped into the same q.

□ 

Remark 3.1. Note the following three observations:

• For all $1 \le i \le n$, according to the constraints C1-C3 above, every
entry of $\Omega^i_{B_i,B_i}$ and $\Omega^i_{B_i,Pa(B_i)}$ is equal either to
zero or to some nd parameter in the parameterization of the probability
distributions in $\mathcal{N}(G)$.

• For all $1 \le i \le n$, by Remark 2.1, every entry of
$(\Omega^i_{B_i,B_i})^{-1}$ is a fraction of real polynomials in the entries
of $\Omega^i_{B_i,B_i}$ and, thus, a fraction of real polynomials in the nd
parameters in the parameterization of the probability distributions in
$\mathcal{N}(G)$. Thus, every entry of the matrices $\delta^i$ and
$\epsilon^i$ in Equations 3.1 and 3.3 is also a fraction of real polynomials
in the referred nd parameters.

• Every entry of the precision matrix of
$r^1(x_{B_1}) r^2(x_{B_2} | x_{B_1})$ in the proof above is, by Equation 2.5,
a real polynomial in the entries of $\delta^2$, $\epsilon^2$ and the
precision matrix of $r^1(x_{B_1})$, i.e. $\epsilon^1$. Likewise, every entry
of the precision matrix of
$r^1(x_{B_1}) r^2(x_{B_2} | x_{B_1}) r^3(x_{B_3} | x_{B_1 B_2})$ in the proof
above is a real polynomial in the entries of $\delta^3$, $\epsilon^3$ and the
precision matrix of $r^1(x_{B_1}) r^2(x_{B_2} | x_{B_1})$, that is, a real
polynomial in the entries of $\delta^3$, $\epsilon^3$, $\delta^2$,
$\epsilon^2$ and $\epsilon^1$. Continuing with this process for the rest of
connectivity components shows that every entry of the precision matrix of
$q(x) = \prod_{i=1}^{n} r^i(x_{B_i} | x_{B_1 \ldots B_{i-1}})$ in the proof
above is a real polynomial in the entries of the matrices $\epsilon^1$, and
$\delta^i$ and $\epsilon^i$ for all $1 < i \le n$.

It follows from the observations above that every entry of the precision
matrix of q in the proof above is a fraction of real polynomials in the nd
parameters in the parameterization of the probability distributions in
$\mathcal{N}(G)$. Consequently, by Remark 2.1, every entry of the covariance
matrix of q is a fraction of real polynomials in the nd parameters in the
parameterization of the probability distributions in $\mathcal{N}(G)$.
Moreover, note the following two observations on the latter fractions:

• Each of these fractions is defined on the whole nd parameter space
for $\mathcal{N}(G)$: The polynomial in the denominator of the fraction is
non-vanishing in the nd parameter space for $\mathcal{N}(G)$ because, as we
have proven in Step 1 of the proof of Lemma 3.1 above, q is a Gaussian
distribution.

• Within the nd parameter space for $\mathcal{N}(G)$, each of these fractions
vanishes only in the points where the polynomial in the numerator of the
fraction vanishes because, as we have just seen, the denominator of the
fraction is non-vanishing in the nd parameter space for $\mathcal{N}(G)$.

We now prove another result that will be crucial in the coming section. 

Lemma 3.2. Let G be a CG of dimension d. The nd parameter space for
$\mathcal{N}(G)$ has positive Lebesgue measure with respect to $\mathbb{R}^d$.

Proof. Since we do not know a closed-form expression of the nd parameter
space for $\mathcal{N}(G)$, we take an indirect approach to prove the lemma.
Recall that, by definition, the nd parameter space for $\mathcal{N}(G)$ is
the set of real values such that, after the extension determined by the
constraints C1 and C2, $\Omega^i_{B_i,B_i}$ is positive definite for all
$1 \le i \le n$. Therefore, all the nd parameters except those in
$\Omega^i_{B_i,B_i}$ for all $1 \le i \le n$ can take values independently
of the rest of the nd parameters. The nd parameters in $\Omega^i_{B_i,B_i}$
cannot take values independently of one another because, otherwise,
$\Omega^i_{B_i,B_i}$ may not be positive definite. However, if the entries in
the diagonal of $\Omega^i_{B_i,B_i}$ take values in $(|B_i| - 1, \infty)$ and
the rest of the nd parameters in $\Omega^i_{B_i,B_i}$ take values in
$[-1, 1]$, then the nd parameters in $\Omega^i_{B_i,B_i}$ can take values
independently of one another. To see it, note that in this case
$\Omega^i_{B_i,B_i}$ will always be Hermitian, strictly diagonally dominant,
and with strictly positive diagonal entries, which implies that
$\Omega^i_{B_i,B_i}$ will always be positive definite (Horn and Johnson, 1985,
Corollary 7.2.3).

The subset of the nd parameter space of $\mathcal{N}(G)$ described in the
paragraph above has positive volume in $\mathbb{R}^d$ and, thus, it has
positive Lebesgue measure with respect to $\mathbb{R}^d$. Then, the nd
parameter space of $\mathcal{N}(G)$ has positive Lebesgue measure with
respect to $\mathbb{R}^d$. □
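
The box of parameter values used in the proof above can be probed numerically. The following sketch is our own illustration: it samples symmetric matrices with diagonal entries above $|B_i| - 1$ and off-diagonal entries in $[-1, 1]$, and tests positive definiteness with a plain Cholesky factorization. The helper names `is_positive_definite` and `sample_block`, and the upper bound placed on the sampled diagonal, are assumptions made for the example only.

```python
import random

def is_positive_definite(A):
    """Cholesky test for a real symmetric matrix stored as a list of rows."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:          # a non-positive pivot rules out definiteness
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

def sample_block(size, rng):
    """Symmetric matrix with diagonal > size - 1 and off-diagonal in [-1, 1]."""
    A = [[0.0] * size for _ in range(size)]
    for i in range(size):
        # The proof allows any value in (size - 1, infinity); we cap it for sampling.
        A[i][i] = (size - 1) + rng.uniform(0.01, 2.0)
        for j in range(i):
            A[i][j] = A[j][i] = rng.uniform(-1.0, 1.0)
    return A

rng = random.Random(0)
checks = [is_positive_definite(sample_block(n, rng))
          for n in range(1, 6) for _ in range(50)]
print(all(checks))
```

Entries that constraint C2 fixes at zero also lie in $[-1, 1]$, so the same diagonal-dominance argument covers sparse blocks.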

4. Faithfulness in chain graphs. The two theorems below are the
main contribution of this manuscript. They prove that for any CG G, in the
measure-theoretic sense described below, almost all the probability
distributions in $\mathcal{N}(G)$ are faithful to G.

Theorem 4.1. Let G be a CG of dimension d. $\mathcal{N}(G)$ has positive
Lebesgue measure with respect to $\mathbb{R}^d$.

Proof. The one-to-one correspondence proved in Lemma 3.1 enables
us to compute the Lebesgue measure with respect to $\mathbb{R}^d$ of
$\mathcal{N}(G)$ as the Lebesgue measure with respect to $\mathbb{R}^d$ of
the nd parameter space for $\mathcal{N}(G)$. Moreover, the latter is positive
by Lemma 3.2. □

Before proving the second theorem, some auxiliary lemmas are proven. 

Lemma 4.1. Let G and H be two CGs such that the undirected (resp.
directed) edges in H are a subset of the undirected (resp. directed) edges in
G. Then, $\mathcal{N}(H) \subseteq \mathcal{N}(G)$.

Proof. Note that a regular Gaussian distribution factorizes with respect
to a CG iff it is Markovian with respect to the CG (Lauritzen, 1996,
Proposition 3.30, Theorems 3.34 and 3.36). Then,
$\mathcal{N}(H) \subseteq \mathcal{N}(G)$ because the independence model
represented by H is a superset of that represented by G. □

Lemma 4.2. Let G be a CG such that

1. G has a route between the nodes i and j that has no collider section,
and

2. the route has no node in $Z \subseteq V \setminus ij$.

Then, there exists a probability distribution $p \in \mathcal{N}(G)$ such
that $i \not\perp_p j | Z$.

Proof. The route in the lemma can be converted into a path $\rho$ between
i and j in G as follows: Iteratively, remove from the route any subroute
between a node and itself. Note that none of these removals produces a
collider section: It suffices to note that if the route after the removal has
a collider section, then the route before the removal must have a collider
section, which is a contradiction. Consequently, $\rho$ is a path between i
and j in G that has no collider section. Therefore, $\rho$ is superactive
with respect to Z: Since the route in the lemma has no node in Z, $\rho$ has
no node in Z either. Now, remove from G all the edges that are not in $\rho$,
and call the resulting CG H. Note that H has no complex since $\rho$ has no
collider section. Drop the direction of every edge in H and call the
resulting UG L. Now, note that there exists a regular Gaussian distribution p
that is faithful to L (Lnenicka & Matus, 2007, Corollary 3) and, thus,
$i \not\perp_p j | Z$ because $i \not\perp_L j | Z$. Note also that the fact
that p is faithful to L implies that p is Markovian with respect to L which,
in turn, implies that p is also Markovian with respect to H, because H and L
have the same underlying UG and complexes (Frydenberg, 1990, Theorem 5.6).
Consequently, $p \in \mathcal{N}(H)$ (Lauritzen, 1996, Proposition 3.30,
Theorems 3.34 and 3.36) and, thus, $p \in \mathcal{N}(G)$ because
$\mathcal{N}(H) \subseteq \mathcal{N}(G)$ by Lemma 4.1. □

Lemma 4.3. Let G be a CG. For every $i, j \in V$ and $Z \subseteq V \setminus ij$, there
exists a real polynomial $S(i,j,Z)$ in the nd parameters in the parameterization
of the probability distributions in $\mathcal{N}(G)$ such that, for every $p \in \mathcal{N}(G)$,
$i \perp_p j | Z$ iff $S(i,j,Z)$ vanishes for the nd parameter values coding p.

Proof. Let $\Sigma$ denote the covariance matrix of p. Note that $i \perp_p j | Z$ iff
$((\Sigma_{ijZ,ijZ})^{-1})_{i,j} = 0$ (Lauritzen, 1996, Proposition 5.2). Recall that
$((\Sigma_{ijZ,ijZ})^{-1})_{i,j} = (-1)^{\alpha} \det(\Sigma_{iZ,jZ}) / \det(\Sigma_{ijZ,ijZ})$ with
$\alpha \in \{0, 1\}$. Note that $\det(\Sigma_{ijZ,ijZ}) > 0$ because $\Sigma_{ijZ,ijZ}$ is
positive definite (Studený, 2005, p. 237). Then, $i \perp_p j | Z$ iff
$\det(\Sigma_{iZ,jZ}) = 0$. Thus, $i \perp_p j | Z$ iff a real polynomial $R(i,j,Z)$ in the
entries of $\Sigma$ vanishes due to Remark 2.1. However, note that it follows from
Lemma 3.1 and Remark 3.1 that each entry of $\Sigma$ is a fraction of real
polynomials in the nd parameters in the parameterization of the probability
distributions in $\mathcal{N}(G)$. Recall also from Remark 3.1 that the polynomial in
the denominator of each of these fractions is non-vanishing in the nd
parameter space for $\mathcal{N}(G)$. Therefore, by simple algebraic manipulation, the
polynomial $R(i,j,Z)$ can be expressed as a fraction $S(i,j,Z)/T(i,j,Z)$ of
real polynomials in the nd parameters where $T(i,j,Z)$ is non-vanishing in
the nd parameter space for $\mathcal{N}(G)$. Consequently, $i \perp_p j | Z$ iff the real
polynomial $S(i,j,Z)$ in the nd parameters vanishes for the values coding p. □
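
The determinant criterion used in the proof above can be illustrated with a minimal Python sketch. It evaluates $\det(\Sigma_{iZ,jZ})$ exactly for a three-variable example. The linear model x0 -> x1 -> x2 with unit-variance noise, the function names and the coefficient values are assumptions made for illustration only; they are not the parameterization of Lemma 3.1.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 0:
        return Fraction(1)
    total = Fraction(0)
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total

def ci_determinant(sigma, i, j, z):
    """det(Sigma_{iZ,jZ}): rows indexed by i and Z, columns by j and Z.
    It is zero iff i and j are independent given Z (regular Gaussian case)."""
    rows = [i] + list(z)
    cols = [j] + list(z)
    return det([[sigma[r][c] for c in cols] for r in rows])

def chain_covariance(b, c):
    """Covariance of (x0, x1, x2) for the assumed toy model
    x0 = e0, x1 = b*x0 + e1, x2 = c*x1 + e2 with independent unit-variance noise."""
    b, c = Fraction(b), Fraction(c)
    v1 = b * b + 1
    return [[Fraction(1), b, b * c],
            [b, v1, c * v1],
            [b * c, c * v1, c * c * v1 + 1]]

sigma = chain_covariance(2, 3)
marginal = ci_determinant(sigma, 0, 2, [])      # tests x0 _|_ x2
conditional = ci_determinant(sigma, 0, 2, [1])  # tests x0 _|_ x2 | x1
print(marginal, conditional)
```

In this example the determinant vanishes for Z = {x1} but not for the empty conditioning set, which agrees with x1 separating x0 from x2 in the chain.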

We interpret the polynomial in the lemma above as a real function on a
real Euclidean space that includes the nd parameter space for $\mathcal{N}(G)$. We say
that the polynomial in the lemma above is non-trivial if not all the values
of the nd parameters are solutions to the polynomial. This is equivalent to
the requirement that the polynomial is not identically zero, because the nd
parameter space for $\mathcal{N}(G)$ contains a d-dimensional interval in $\mathbb{R}^d$, where d
is the dimension of G (recall the proof of Lemma 3.2).

Let $\upsilon$ denote an undirected route $v_2 - \ldots - v_{l-1}$ in a CG. Hereinafter, we
denote by $v_1 \to \upsilon \leftarrow v_l$ the route $v_1 \to v_2 - \ldots - v_{l-1} \leftarrow v_l$.

Lemma 4.4. Let G be a CG such that

1. G has a route $i \to \upsilon \leftarrow j$ where $i, j \in V$ and $\upsilon$ is an undirected route,
and

2. some node in $\upsilon$ is in Z or has a descendant in Z, where $Z \subseteq V \setminus ij$.

Then, there exists a probability distribution $p \in \mathcal{N}(G)$ such that $i \not\perp_p j | Z$.

Proof. The route $\upsilon$ can be converted into a path $\vartheta$ in G as follows:
Iteratively, remove from $\upsilon$ any subroute between a node and itself. Note
that $\upsilon$ does not contain either i or j because, otherwise, G would have a
directed pseudocycle between i and itself or between j and itself, which
is a contradiction. Therefore, $\vartheta$ does not contain either i or j and, thus,
$i \to \vartheta \leftarrow j$ is a path in G. Note that the subroutes removed from $\upsilon$ contain
only undirected edges. Therefore, every node that is in $\upsilon$ but not in $\vartheta$ is a
descendant of some node in $\vartheta$. Consequently, some node in $\vartheta$ is in Z or has
a descendant in Z, due to the assumptions in the lemma.

We first prove the lemma for the case where some node in $\vartheta$ is in Z.
Remove from G all the edges that are not in $i \to \vartheta \leftarrow j$, and call the
resulting CG H. Note that $i \to \vartheta \leftarrow j$ is a complex in H and, thus, that
$i \perp_H j$. Let k denote the closest node to i that is in $\vartheta$ and in Z.

We prove in this paragraph that there exists a probability distribution
$p \in \mathcal{N}(H)$ such that $i \not\perp_p jZ \setminus k | k$. By Lemma 4.3, there exists a real
polynomial $S(i,k,\emptyset)$ in the nd parameters in the parameterization of the
probability distributions in $\mathcal{N}(H)$ such that, for every $q \in \mathcal{N}(H)$, $i \perp_q k$
iff $S(i,k,\emptyset)$ vanishes for the nd parameter values coding q. Furthermore,
$S(i,k,\emptyset)$ is non-trivial. To see this, remove from H all the edges outside the
path between i and k, and call the resulting CG L. Note that $\mathcal{N}(L) \subseteq \mathcal{N}(H)$
by Lemma 4.1. Now note that, by Lemma 4.2, there exists a probability
distribution $r \in \mathcal{N}(L)$ such that $i \not\perp_r k$. By an analogous reasoning, we
can conclude that there exists a non-trivial real polynomial $S(j,k,\emptyset)$ in the
nd parameters in the parameterization of the probability distributions in
$\mathcal{N}(H)$ such that, for every $q \in \mathcal{N}(H)$, $j \perp_q k$ iff $S(j,k,\emptyset)$ vanishes for the
nd parameter values coding q. Let $sol(i,k,\emptyset)$ and $sol(j,k,\emptyset)$ denote the sets
of solutions to the polynomials $S(i,k,\emptyset)$ and $S(j,k,\emptyset)$, respectively. Let d
denote the dimension of H. Then, $sol(i,k,\emptyset)$ and $sol(j,k,\emptyset)$ both have zero
Lebesgue measure with respect to $\mathbb{R}^d$ because they consist of the solutions to
non-trivial real polynomials in real variables (the nd parameters) (Okamoto,
1973). Then, $sol = sol(i,k,\emptyset) \cup sol(j,k,\emptyset)$ also has zero Lebesgue measure
with respect to $\mathbb{R}^d$, because the finite union of sets of zero Lebesgue measure
has zero Lebesgue measure too. Consequently, the probability distributions
$q \in \mathcal{N}(H)$ such that $i \perp_q k$ or $j \perp_q k$ correspond to a set of elements of the
nd parameter space for $\mathcal{N}(H)$ that has zero Lebesgue measure with respect
to $\mathbb{R}^d$ because it is contained in sol. Since this correspondence is one-to-one
by Lemma 3.1, the probability distributions $q \in \mathcal{N}(H)$ such that $i \perp_q k$
or $j \perp_q k$ also have zero Lebesgue measure with respect to $\mathbb{R}^d$. This result
together with Theorem 4.1 implies that there exists a probability distribution
$p \in \mathcal{N}(H)$ such that $i \not\perp_p k$ and $j \not\perp_p k$. Furthermore, as shown above,
$i \perp_H j$ and, thus, $i \perp_p j$ because p is Markovian with respect to H, since
$p \in \mathcal{N}(H)$ (Lauritzen, 1996, Proposition 3.30, Theorems 3.34 and 3.36). Then,
$i \not\perp_p j | k$ by symmetry and weak transitivity and, thus, $i \not\perp_p jZ \setminus k | k$
by decomposition.
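
The weak transitivity step above can be made concrete with a small Python sketch for the simplest complex, i -> k <- j. The linear model, the unit variances and the names below are assumptions made for illustration; they are not the parameterization of Lemma 3.1. In this toy model, the polynomial deciding whether i and j are independent given k is -a*b, so it vanishes only when a = 0 or b = 0, that is, only when i or j is independent of k.

```python
from fractions import Fraction

def collider_statistics(a, b):
    """Assumed toy model: x_i and x_j are independent with unit variance and
    x_k = a*x_i + b*x_j + noise, where the noise also has unit variance."""
    a, b = Fraction(a), Fraction(b)
    s_ij = Fraction(0)       # cov(x_i, x_j): i and j are marginally independent
    s_ik = a                 # cov(x_i, x_k)
    s_jk = b                 # cov(x_j, x_k)
    s_kk = a * a + b * b + 1 # var(x_k)
    # det(Sigma_{ik,jk}) with rows (i, k) and columns (j, k);
    # it is zero iff x_i and x_j are independent given x_k.
    det_given_k = s_ij * s_kk - s_ik * s_jk
    return {"i_k": s_ik, "j_k": s_jk, "i_j": s_ij, "i_j_given_k": det_given_k}

stats = collider_statistics(2, 3)
print(stats["i_j"], stats["i_j_given_k"])
```

The zero set of -a*b is the union of the two hyperplanes a = 0 and b = 0, which mirrors the role played by the set sol in the argument.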

Finally, recall that k is the closest node to i that is in $\vartheta$ and in Z. Then,
$i \perp_H Z \setminus k | jk$ and, thus, $i \perp_p Z \setminus k | jk$ because p is Markovian with respect
to H. Then, $i \not\perp_p j | Z$ by intersection on $i \not\perp_p jZ \setminus k | k$ and $i \perp_p Z \setminus k | jk$.
Consequently, we have proven that there exists a probability distribution
$p \in \mathcal{N}(H)$ such that $i \not\perp_p j | Z$. Moreover, $p \in \mathcal{N}(G)$ because
$\mathcal{N}(H) \subseteq \mathcal{N}(G)$ by Lemma 4.1.

We now prove the lemma for the case where no node in $\vartheta$ is in Z but
some node in $\vartheta$ has a descendant in Z. Consider the shortest descending
path between a node in $\vartheta$ and a node in Z. Let l and k denote the initial
and final nodes of the path, i.e. $k \in Z$. Remove from G all the edges that are
not in $i \to \vartheta \leftarrow j$ or in the path between l and k, and call the resulting CG
H. Note that $i \to \vartheta \leftarrow j$ is a complex in H and, thus, that $i \perp_H j$. Therefore,
we can follow the same steps as above to prove that there exists a probability
distribution $p \in \mathcal{N}(H)$ such that $i \not\perp_p jZ \setminus k | k$. Finally, recall that there is
no path between i and any node in $Z \setminus k$ in H. Then, $i \perp_H Z \setminus k | jk$ and, thus,
$i \perp_p Z \setminus k | jk$ because p is Markovian with respect to H. Then, $i \not\perp_p j | Z$ by
intersection on $i \not\perp_p jZ \setminus k | k$ and $i \perp_p Z \setminus k | jk$. Consequently, we have proven
that there exists a probability distribution $p \in \mathcal{N}(H)$ such that $i \not\perp_p j | Z$.
Moreover, $p \in \mathcal{N}(G)$ because $\mathcal{N}(H) \subseteq \mathcal{N}(G)$ by Lemma 4.1. □

Lemma 4.5. Let G be a CG such that $i \not\perp_G j | Z$ where $i, j \in V$ and
$Z \subseteq V \setminus ij$. Then, there exists a probability distribution $p \in \mathcal{N}(G)$ such that
$i \not\perp_p j | Z$.

Proof. We prove the lemma in two steps. In the first step, we introduce 
some notation that we use in the second step, the actual proof of the lemma. 

Step 1 Given a route $\rho$ in a CG H, we define $H_\rho$ as the CG resulting
from removing from H all the edges that are not in $\rho$. We define the level of
a node in H as the index of the connectivity component the node belongs
to. We define the dlength of a route as the number of distinct edges in the
route. Note the difference between the dlength and the length of a route:
The former counts edges without repetition and the latter with repetition
(recall Section 2). We say that a route is dshorter than another route if the
former has smaller dlength than the latter. Likewise, we say that a route is
dshortest if no other route is dshorter than it. Let $\mathfrak{N}$ denote any total order
of the nodes in the CG H. Let $\mathfrak{R}$ denote any total order of all the routes
between two nodes in H. Finally, if $a \not\perp_H b | C$ where $a, b \in V$ and $C \subseteq V \setminus ab$,
then we define $splits(a, b, C, H)$ as follows:

S1. If there is a route in H like that in Lemma 4.2 or 4.4 for i = a, j = b
and Z = C, then we define $splits(a, b, C, H) = 0$.

S2. Otherwise, we define recursively $splits(a, b, C, H) = splits(a, k, C, H_\rho) +
splits(b, k, C, H_\rho) + 1$, where $\rho$ and k are selected as follows. Let $\Psi$
denote the set of routes between a and b in H that are superactive with
respect to C. Let $\Phi$ denote the dshortest routes in $\Psi$. Let $\Gamma$ denote the
shortest routes in $\Phi$. Let $\rho$ denote the route in $\Gamma$ that comes first in
$\mathfrak{R}$. We call $\rho$ the splitting route. Furthermore, let K denote the set of
nodes in $\rho$ but not in Cab that have minimal level in $H_\rho$. Let k denote
the node in K that comes first in $\mathfrak{N}$. Note that the only point with $\mathfrak{R}$
and $\mathfrak{N}$ is to select $\rho$ and k unambiguously.

Note that we have implicitly assumed in the definition S2 that K is
non-empty. We now prove that this is always true. Assume to the contrary that K
is empty. This means that all the nodes in $\rho$ are in Cab. Since the definition
S1 did not apply, $\rho$ must have some collider section $\upsilon$. Moreover,
$a = v_1 \to \upsilon \leftarrow v_l = b$ is a subroute of $\rho$: If $v_1 \neq a, b$ (resp. $v_l \neq a, b$) then
$v_1$ (resp. $v_l$) must be outside C for $\rho$ to be superactive with respect to C, which
contradicts the assumption that all the nodes in $\rho$ are in Cab. Moreover, some
node in $\upsilon$ must be in C for $\rho$ to be superactive with respect to C. However,
this implies that $a \to \upsilon \leftarrow b$ is a route that satisfies the requirements of the
definition S1, which is a contradiction.

Finally, we prove that $splits(a, k, C, H_\rho)$ and $splits(b, k, C, H_\rho)$ in the
definition S2 are well-defined. Let $\varrho$ denote the subroute of $\rho$ between the first
occurrences of a and k in $\rho$ when going from a to b. Note that if $\rho$ contains k
only in non-collider sections, then none of the other nodes in those sections
can be in C for $\rho$ to be superactive with respect to C and, thus, $\varrho$ is a route
between a and k in $H_\rho$ that is superactive with respect to C and, thus,
$a \not\perp_{H_\rho} k | C$ and, thus, $splits(a, k, C, H_\rho)$ is defined. We now prove that $\rho$
contains k only in non-collider sections. Assume the contrary and let $\upsilon$ denote
any collider section of $\rho$ that contains k. Note that $a = v_1 \to \upsilon \leftarrow v_l = b$
is a subroute of $\rho$, because if $v_1 \neq a, b$ or $v_l \neq a, b$ then there exists a node
in $\rho$ but not in Cab with smaller level than k in $H_\rho$, which is a
contradiction. Moreover, some node in $\upsilon$ must be in C for $\rho$ to be superactive with
respect to C. However, this implies that $a \to \upsilon \leftarrow b$ is a route that satisfies
the requirements of the definition S1, which is a contradiction. Now, let $\varphi$
denote the subroute of $\rho$ between the first occurrences of b and k in $\rho$ when
going from b to a. By repeating the reasoning above with $\varphi$ instead of $\varrho$, we
can conclude that $b \not\perp_{H_\rho} k | C$ and, thus, that $splits(b, k, C, H_\rho)$ is defined too.
Moreover, note that $\varrho$ and $\varphi$ have dlength equal to or smaller than that of $\rho$ and
length strictly smaller than that of $\rho$. Therefore, the splitting routes for
$splits(a, k, C, H_\rho)$ and $splits(b, k, C, H_\rho)$ are each either dshorter or shorter
than $\rho$. This guarantees that the recursive definition S2 eventually reaches the
trivial case S1.
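
The distinction between the length and the dlength of a route, introduced at the beginning of Step 1, can be sketched in a few lines of Python. The representation is an assumption made for illustration: a route is a list of node names, and an edge is identified by its unordered pair of endpoints, which presumes at most one edge between any two nodes.

```python
def length(route):
    """Number of edges traversed by the route, counted with repetition."""
    return len(route) - 1

def dlength(route):
    """Number of distinct edges in the route, counted without repetition.
    An edge is identified by its unordered pair of endpoints (an assumption)."""
    return len({frozenset(pair) for pair in zip(route, route[1:])})

# Hypothetical route that traverses the edge between b and c twice.
route = ["a", "b", "c", "b", "d"]
print(length(route), dlength(route))
```

Under these assumptions, a route that revisits an edge has a dlength strictly smaller than its length, whereas the two quantities coincide for a path.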

Step 2 We prove the lemma by induction over the value of $splits(i, j, Z, G)$.
If $splits(i, j, Z, G) = 0$, then there exists a route in G like that in Lemma
4.2 or 4.4. Therefore, there exists a probability distribution $p \in \mathcal{N}(G)$ such
that $i \not\perp_p j | Z$ by Lemma 4.2 or 4.4.

Assume as induction hypothesis that the lemma holds for any value of
$splits(i, j, Z, G)$ smaller than m (m > 0). We now prove it for value m.
Recall that $splits(i, j, Z, G) = splits(i, k, Z, G_\rho) + splits(j, k, Z, G_\rho) + 1$ where
$\rho$ is a dshortest route among all the routes between i and j in G that are
superactive with respect to Z, and k is a node in $\rho$ but not in Zij that has
minimal level in $G_\rho$. Then, as shown in Step 1, $i \not\perp_{G_\rho} k | Z$ and $j \not\perp_{G_\rho} k | Z$.
Moreover, $splits(i, k, Z, G_\rho)$ and $splits(j, k, Z, G_\rho)$ are both smaller than m.
Then, by the induction hypothesis, there exist two probability distributions
$r, s \in \mathcal{N}(G_\rho)$ such that $i \not\perp_r k | Z$ and $j \not\perp_s k | Z$. We prove below that there
exists a probability distribution $p \in \mathcal{N}(G_\rho)$ such that $i \not\perp_p j | Z$. Note that
$p \in \mathcal{N}(G)$ because $\mathcal{N}(G_\rho) \subseteq \mathcal{N}(G)$ by Lemma 4.1.

By Lemma 4.3, there exists a real polynomial $S(i,k,Z)$ in the nd
parameters in the parameterization of the probability distributions in $\mathcal{N}(G_\rho)$
such that, for every $q \in \mathcal{N}(G_\rho)$, $i \perp_q k | Z$ iff $S(i,k,Z)$ vanishes for the nd
parameter values coding q. Furthermore, $S(i,k,Z)$ is non-trivial due to the
probability distribution r above. Similarly, there exists a real polynomial
$S(j,k,Z)$ in the nd parameters in the parameterization of the probability
distributions in $\mathcal{N}(G_\rho)$ such that, for every $q \in \mathcal{N}(G_\rho)$, $j \perp_q k | Z$ iff $S(j,k,Z)$
vanishes for the nd parameter values coding q. Furthermore, $S(j,k,Z)$ is also
non-trivial due to the probability distribution s above. Let $sol(i,k,Z)$ and
$sol(j,k,Z)$ denote the sets of solutions to the polynomials $S(i,k,Z)$ and
$S(j,k,Z)$, respectively. Let d denote the dimension of $G_\rho$. Then, $sol(i,k,Z)$
and $sol(j,k,Z)$ both have zero Lebesgue measure with respect to $\mathbb{R}^d$ because
they consist of the solutions to non-trivial real polynomials in real variables
(the nd parameters) (Okamoto, 1973). Then, $sol = sol(i,k,Z) \cup sol(j,k,Z)$
also has zero Lebesgue measure with respect to $\mathbb{R}^d$, because the finite union
of sets of zero Lebesgue measure has zero Lebesgue measure too.
Consequently, the probability distributions $q \in \mathcal{N}(G_\rho)$ such that $i \perp_q k | Z$ or
$j \perp_q k | Z$ correspond to a set of elements of the nd parameter space for
$\mathcal{N}(G_\rho)$ that has zero Lebesgue measure with respect to $\mathbb{R}^d$ because it is
contained in sol. Since this correspondence is one-to-one by Lemma 3.1, the
probability distributions $q \in \mathcal{N}(G_\rho)$ such that $i \perp_q k | Z$ or $j \perp_q k | Z$ also have
zero Lebesgue measure with respect to $\mathbb{R}^d$. This result together with
Theorem 4.1 implies that there exists a probability distribution $p \in \mathcal{N}(G_\rho)$ such
that $i \not\perp_p k | Z$ and $j \not\perp_p k | Z$. Note that these two dependence statements
together with $i \perp_p j | Zk$ would imply the desired result by symmetry and weak
transitivity. We prove below $i \perp_{G_\rho} j | Zk$ which, in turn, implies $i \perp_p j | Zk$
because p is Markovian with respect to $G_\rho$, since $p \in \mathcal{N}(G_\rho)$ (Lauritzen,
1996, Proposition 3.30, Theorems 3.34 and 3.36).

Assume to the contrary $i \not\perp_{G_\rho} j | Zk$. Let $\varrho$ denote any route between
i and j in $G_\rho$ that is superactive with respect to Zk. Note that $\varrho$ must
contain k because, otherwise, $\varrho$ would be a route between i and j in G
that is superactive with respect to Z and that is dshorter than $\rho$, which
is a contradiction. Furthermore, $\varrho$ must contain k only in collider sections
because, otherwise, $\varrho$ would not be superactive with respect to Zk. Let $\upsilon$
denote any collider section of $\varrho$ that contains k. Note that
$i = v_1 \to \upsilon \leftarrow v_l = j$ is a subroute of $\varrho$, because if $v_1 \neq i, j$ or $v_l \neq i, j$ then
there exists a node in $\varrho$ but not in Zij with smaller level than k in $G_\rho$. Since $\varrho$ is
a route in $G_\rho$, this implies that there exists a node in $\rho$ but not in Zij with smaller
level than k in $G_\rho$, which is a contradiction. Note also that no descendant
of k in G can be in Z because, otherwise, $i \to \upsilon \leftarrow j$ would be a route
that satisfies the requirements of the definition S1, which is a contradiction.
However, if no descendant of k in G is in Z, then $\rho$ must contain k only
in non-collider sections because, otherwise, $\rho$ would not be superactive with
respect to Z. The last two observations imply that i or j is a descendant
of k in G which, together with $i \to \upsilon \leftarrow j$, implies that G has a directed
pseudocycle between i and itself or between j and itself, because $\upsilon$ contains
k. This is a contradiction. □

Theorem 4.2. Let G be a CG of dimension d. The set of probability
distributions in $\mathcal{N}(G)$ that are not faithful to G has zero Lebesgue measure
with respect to $\mathbb{R}^d$.

Proof. Note that the probability distributions in $\mathcal{N}(G)$ are Markovian
with respect to G (Lauritzen, 1996, Proposition 3.30, Theorems 3.34 and
3.36). Then, for any probability distribution $p \in \mathcal{N}(G)$ not to be faithful
to G, p must satisfy some independence that is not entailed by G. That
is, there must exist three disjoint subsets of V, here denoted as I, J and
Z, such that $I \not\perp_G J | Z$ but $I \perp_p J | Z$. However, if $I \not\perp_G J | Z$ then $i \not\perp_G j | Z$
for some $i \in I$ and $j \in J$. Furthermore, if $I \perp_p J | Z$ then $i \perp_p j | Z$ by
symmetry and decomposition. By Lemma 4.3, there exists a real polynomial
$S(i,j,Z)$ in the nd parameters in the parameterization of the probability
distributions in $\mathcal{N}(G)$ such that, for every $q \in \mathcal{N}(G)$, $i \perp_q j | Z$
iff $S(i,j,Z)$ vanishes for the nd parameter values coding q. Furthermore,
$S(i,j,Z)$ is non-trivial by Lemma 4.5. Let $sol(i,j,Z)$ denote the set of
solutions to the polynomial $S(i,j,Z)$. Then, $sol(i,j,Z)$ has zero Lebesgue
measure with respect to $\mathbb{R}^d$ because it consists of the solutions to a
non-trivial real polynomial in real variables (the nd parameters) (Okamoto,
1973). Then,

$$sol = \bigcup_{\{I, J, Z \subseteq V \text{ disjoint} \,:\, I \not\perp_G J | Z\}} \; \bigcup_{\{i \in I, j \in J \,:\, i \not\perp_G j | Z\}} sol(i,j,Z)$$

has zero Lebesgue measure with respect to $\mathbb{R}^d$, because the finite union of sets
of zero Lebesgue measure has zero Lebesgue measure too. Consequently, the
probability distributions in $\mathcal{N}(G)$ that are not faithful to G correspond to a
set of elements of the nd parameter space for $\mathcal{N}(G)$ that has zero Lebesgue
measure with respect to $\mathbb{R}^d$ because it is contained in sol. Since this
correspondence is one-to-one by Lemma 3.1, the probability distributions in $\mathcal{N}(G)$
that are not faithful to G also have zero Lebesgue measure with respect to
$\mathbb{R}^d$. □
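
To illustrate why unfaithful distributions are confined to the zero set of a polynomial, the following Python sketch considers a hypothetical acyclic directed graph (a special case of a CG) with edges i -> k, k -> j and i -> j. The linear model, the unit noise variances and the coefficient names a, b, c are assumptions made for illustration; they are not the parameterization of Lemma 3.1. Since i and j are adjacent, the graph entails no independence between them, yet the covariance of x_i and x_j is the polynomial a*b + c, which vanishes on the surface c = -a*b.

```python
from fractions import Fraction

def cov_ij(a, b, c):
    """cov(x_i, x_j) in the assumed toy model
    x_i = e_i, x_k = a*x_i + e_k, x_j = b*x_k + c*x_i + e_j,
    where e_i, e_k, e_j are independent with unit variance."""
    return Fraction(a) * Fraction(b) + Fraction(c)

def is_unfaithful(a, b, c):
    """True when x_i and x_j are marginally independent although the graph
    does not entail it, i.e. when the two paths from i to j cancel out."""
    return cov_ij(a, b, c) == 0

generic = is_unfaithful(2, 3, 1)     # a generic choice of coefficients
cancelling = is_unfaithful(2, 3, -6) # lies on the surface c = -a*b
print(generic, cancelling)
```

The surface c = -a*b is two-dimensional inside the three-dimensional space of coefficients, so it has zero Lebesgue measure there, in line with the role of sol(i,j,Z) in the proof.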

The following corollary, which follows trivially from Theorems 4.1 and
4.2, summarizes the results in this section.

Corollary 4.1. Let G be a CG of dimension d. The set of probability
distributions in $\mathcal{N}(G)$ that are faithful to G has positive Lebesgue measure
with respect to $\mathbb{R}^d$.

5. Equivalence in chain graphs. The space of CGs can be divided in 
classes of equivalent CGs according to criteria such as Markov independence 
equivalence, Markovian distribution equivalence or factorization equivalence. 
As we prove below with the help of the theorems above, these criteria ac- 
tually coincide. This result is important because the classes of Markovian 
distribution equivalent CGs have a simple graphical characterization and a 
natural representative, the so-called largest CG, which now also apply to the 
classes of equivalence induced by the other two criteria mentioned. We also 
prove below that all equivalent CGs have the same dimension with respect 
to the parameterization introduced in Section 4. 

Before proving our results, we formally define the equivalence criteria
discussed in the paragraph above. Recall that, unless otherwise stated, all
the probability distributions in this paper are defined on (state space) $\mathbb{R}^N$,
where $|V| = N$. We say that two CGs are Markov independence equivalent
if they represent the same independence model. We say that two CGs are
Markovian distribution equivalent if every regular Gaussian distribution is
Markovian with respect to both CGs or with respect to neither of them. We
say that two CGs G and H are factorization equivalent if $\mathcal{N}(G) = \mathcal{N}(H)$.
The corollary below proves that these definitions coincide.

Corollary 5.1. Let G and H denote two CGs. The following state- 
ments are equivalent in the frame of regular Gaussian distributions: 

1. G and H are factorization equivalent. 

2. G and H are Markovian distribution equivalent. 

3. G and H are Markov independence equivalent. 

Proof. The equivalence of Statements 1 and 2 follows from (Lauritzen,
1996, Proposition 3.30, Theorems 3.34 and 3.36). We now prove that
Statements 2 and 3 are equivalent. By definition, Markov independence
equivalence implies Markovian distribution equivalence. To see the opposite
implication, note that if G and H are not Markov independence equivalent, then
one of them, say G, must represent a separation statement $I \perp_G J | K$ that is
not represented by H. Consider a probability distribution $p \in \mathcal{N}(H)$ faithful
to H. Such a probability distribution exists due to Corollary 4.1, and it is
Markovian with respect to H. However, p cannot be Markovian with respect
to G, because $I \not\perp_H J | K$ implies $I \not\perp_p J | K$. □

Frydenberg (1990, Theorem 5.6) gives a straightforward graphical
characterization of Markovian distribution equivalence: Two CGs are Markovian
distribution equivalent iff they have the same underlying UG and the same 
complexes. Due to the corollary above, that is also a graphical character- 
ization of the other two types of equivalence discussed there. Hereinafter, 
we do not distinguish anymore between the different types of equivalence 
discussed in the corollary above because they coincide and, thus, we simply 
refer to them as equivalence. 

Frydenberg (1990, Proposition 5.7) shows that every class of equivalent 
CGs contains a unique CG that has more undirected edges than any other 
CG in the class. Such a CG is called the largest CG (LCG) in the class, and 
it is usually considered a natural representative of the class. Studený (1998,
Section 4.2) conjectures that, for discrete probability distributions, the LCG 
in a class of equivalent CGs has fewer nd parameters than any other CG 
in the class. This would imply that the most space efficient way of storing 
the discrete probability distributions that factorize with respect to a class of 
equivalent CGs is by factorizing them with respect to the LCG in the class 
rather than with respect to any other CG in the class. The corollary below 
proves that an analogous conjecture for regular Gaussian distributions and 
the parameterization of them proposed in Section 4 would be false. 

Corollary 5.2. All equivalent CGs have the same dimension with
respect to the parameterization proposed in Section 4.

Proof. Let G denote the LCG in a class of equivalent CGs. Let H
denote any other CG in the class. Recall that the dimensions of G and H
with respect to the parameterization proposed in Section 4 are, respectively,
$2|V| + |G|$ and $2|V| + |H|$. Note that H can be obtained from G by orienting
some of the undirected edges in G (Volf & Studený, 1999, Theorem 3.9).
Then, $|H| = |G|$ and, thus, $2|V| + |G| = 2|V| + |H|$. □
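
The counting argument in the proof above can be sketched in Python. The sketch assumes that $|G|$ denotes the number of edges of G, and it uses a hypothetical pair of equivalent CGs over three nodes: the undirected chain a - b - c and the directed chain a -> b -> c, which have the same underlying UG and no complexes.

```python
def dimension(nodes, undirected_edges, directed_edges):
    """Dimension 2|V| + |G|, reading |G| as the total number of edges
    (an assumption about the notation used in the proof)."""
    return 2 * len(nodes) + len(undirected_edges) + len(directed_edges)

nodes = {"a", "b", "c"}

# Hypothetical class of equivalent CGs: orienting undirected edges
# changes the type of each edge but not the number of edges.
dim_undirected = dimension(nodes, {("a", "b"), ("b", "c")}, set())
dim_oriented = dimension(nodes, set(), {("a", "b"), ("b", "c")})
print(dim_undirected, dim_oriented)
```

Both graphs have dimension 8 under this reading, since orienting an undirected edge leaves the edge count unchanged.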

6. Conclusions. In this paper, we have proven that, in certain measure- 
theoretic sense, almost all the regular Gaussian distributions that factorize 
with respect to a chain graph are faithful to it. This result extends previous 
results such as 

• (Spirtes et al., 1993, Theorem 3.2) where it is proven that, in certain 
measure-theoretic sense, almost all the regular Gaussian distributions 
that factorize with respect to an acyclic directed graph are faithful to 
it, and 

• (Lněnička & Matúš, 2007, Corollary 3) where it is proven that for any
undirected graph there exists a regular Gaussian distribution that is
faithful to it.

There are a number of consequences that follow from the result proven in 
this paper: 

• There are independence models that can be represented exactly by 
chain graphs but that cannot be represented exactly by undirected 
graphs or acyclic directed graphs. As a matter of fact, the experimental 
results in (Peña, 2007) suggest that this may be the case for the vast
majority of independence models that can be represented exactly by 
chain graphs. This is an advantage of chain graphs when dealing with 
regular Gaussian distributions, because there exists a regular Gaussian 
distribution that is faithful to each of these independence models. 

• The moralization and c-separation criteria for reading independen- 
cies holding in the regular Gaussian distributions that factorize with 
respect to a chain graph are complete (i.e. they identify all the inde- 
pendencies that can be identified on the sole basis of the chain graph), 
because there exists a regular Gaussian distribution that is faithful to 
the chain graph. 

• Some definitions of equivalence in chain graphs coincide, which implies
that the graphical characterization of Markovian distribution
equivalence in (Frydenberg, 1990, Theorem 5.6) also applies to other
definitions of equivalence.

• For the parameterization introduced in this paper, all the chain graphs
in a class of equivalence have the same dimension and, thus, their
factorizations are equally space efficient for storing the regular Gaussian
distributions that factorize with respect to the chain graphs in the class.

Appendix A. In this appendix, we derive Equations 2.1-2.6. Our
derivations are adaptations of those in (Bishop, 2006, Sections 2.3.1, 2.3.3) to the
notation used in this paper. A Gaussian distribution for X can be written
as

$$p(x) = \mathcal{N}(\mu, \Omega^{-1}) = \frac{e^{-\frac{1}{2}(x-\mu)^T \Omega (x-\mu)}}{k}$$

where $\mu$ is a $|V|$-dimensional mean vector, $\Omega^{-1}$ a $|V| \times |V|$-dimensional
covariance matrix, and k a normalization constant. Note that

$$\begin{aligned}
-\frac{1}{2}(x-\mu)^T \Omega (x-\mu) &= -\frac{1}{2}[x^T \Omega x - x^T \Omega \mu - \mu^T \Omega x + \mu^T \Omega \mu] \\
&= -\frac{1}{2} x^T \Omega x + x^T \Omega \mu + k'
\end{aligned}$$

where $k'$ is a constant, i.e. it is independent of x. In the last equality above
we have used the fact that

$$x^T \Omega \mu = (x^T \Omega \mu)^T = \mu^T (x^T \Omega)^T = \mu^T \Omega^T (x^T)^T = \mu^T \Omega x \tag{6.1}$$

because $\Omega$ is symmetric. Then, a Gaussian distribution for X can be written
as

$$p(x) = \mathcal{N}(\mu, \Omega^{-1}) = \frac{e^{-\frac{1}{2} x^T \Omega x + x^T \Omega \mu}}{k''} \tag{6.2}$$

where $k''$ is a normalization constant.

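Let I and J denote two disjoint subsets of V. Let $p(x_{IJ}) = \mathcal{N}(\mu, \Omega^{-1})$
where $\Omega$ is positive definite. If we regard $x_I$ as constant, then

$$\begin{aligned}
-\frac{1}{2}(x-\mu)^T \Omega (x-\mu) &= -\frac{1}{2}[(x_I-\mu_I)^T \Omega_{I,I} (x_I-\mu_I) + (x_I-\mu_I)^T \Omega_{I,J} (x_J-\mu_J) \\
&\qquad + (x_J-\mu_J)^T \Omega_{J,I} (x_I-\mu_I) + (x_J-\mu_J)^T \Omega_{J,J} (x_J-\mu_J)] \\
&= -\frac{1}{2}[x_I^T \Omega_{I,J} x_J - \mu_I^T \Omega_{I,J} x_J + x_J^T \Omega_{J,I} x_I - x_J^T \Omega_{J,I} \mu_I + x_J^T \Omega_{J,J} x_J - x_J^T \Omega_{J,J} \mu_J \\
&\qquad - \mu_J^T \Omega_{J,J} x_J] + k''' \\
&= -\frac{1}{2} x_J^T \Omega_{J,J} x_J + x_J^T [\Omega_{J,J} \mu_J - \Omega_{J,I}(x_I - \mu_I)] + k'''
\end{aligned}$$

where $k'''$ is a constant, i.e. it is independent of $x_J$. In the last equality above
we have used a reasoning analogous to that in Equation 6.1. Then,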



$$p(x_J | x_I) = \frac{p(x_{IJ})}{p(x_I)} = \frac{e^{-\frac{1}{2} x_J^T \Omega_{J,J} x_J + x_J^T [\Omega_{J,J} \mu_J - \Omega_{J,I}(x_I - \mu_I)]}}{k''''} \tag{6.3}$$

where $k''''$ is a normalization constant, because $x_I$ can be regarded as a
constant in $p(x_J | x_I)$ since it is the value of the conditioning set. Consequently,
$p(x_J | x_I)$ is a Gaussian distribution since it can be written in the form given
in Equation 6.2. By equating the term that is quadratic in x in Equation
6.2 with the term that is quadratic in $x_J$ in Equation 6.3, we conclude that
the covariance matrix of $p(x_J | x_I)$ is $(\Omega_{J,J})^{-1}$. By equating the term that is
linear in x in Equation 6.2 with the term that is linear in $x_J$ in Equation
6.3, we conclude that the mean vector of $p(x_J | x_I)$ is

$$\begin{aligned}
(\Omega_{J,J})^{-1} [\Omega_{J,J} \mu_J - \Omega_{J,I}(x_I - \mu_I)] &= \mu_J - (\Omega_{J,J})^{-1} \Omega_{J,I} (x_I - \mu_I) \\
&= -(\Omega_{J,J})^{-1} \Omega_{J,I} x_I + \mu_J + (\Omega_{J,J})^{-1} \Omega_{J,I} \mu_I.
\end{aligned}$$

Therefore, $p(x_J | x_I) = \mathcal{N}(\delta x_I + \gamma, \epsilon^{-1})$ where $\delta$, $\gamma$ and $\epsilon$ are the following
real matrices of dimensions, respectively, $|J| \times |I|$, $|J| \times 1$ and $|J| \times |J|$:

$$\delta = -(\Omega_{J,J})^{-1} \Omega_{J,I},$$

$$\gamma = \mu_J + (\Omega_{J,J})^{-1} \Omega_{J,I} \mu_I$$

and

$$\epsilon = \Omega_{J,J}.$$
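
As a sanity check of the expressions for $\delta$, $\gamma$ and $\epsilon$, the following Python sketch compares them, in the scalar case |I| = |J| = 1, with the textbook regression form of the conditional Gaussian (slope, intercept and Schur-complement variance computed from the covariance matrix). The numeric values and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction

def conditional_from_precision(mu_i, mu_j, s_ii, s_ij, s_jj):
    """delta, gamma, epsilon of p(x_J | x_I) computed from the precision matrix
    Omega = Sigma^{-1}, for scalar x_I and x_J."""
    s_ii, s_ij, s_jj = Fraction(s_ii), Fraction(s_ij), Fraction(s_jj)
    det = s_ii * s_jj - s_ij * s_ij
    omega_ji = -s_ij / det  # off-diagonal entry of the 2x2 inverse
    omega_jj = s_ii / det   # (J, J) entry of the 2x2 inverse
    delta = -omega_ji / omega_jj
    gamma = Fraction(mu_j) + omega_ji / omega_jj * Fraction(mu_i)
    epsilon = omega_jj
    return delta, gamma, epsilon

def conditional_from_covariance(mu_i, mu_j, s_ii, s_ij, s_jj):
    """Slope, intercept and variance of p(x_J | x_I) in regression form."""
    s_ii, s_ij, s_jj = Fraction(s_ii), Fraction(s_ij), Fraction(s_jj)
    slope = s_ij / s_ii
    intercept = Fraction(mu_j) - slope * Fraction(mu_i)
    variance = s_jj - s_ij * s_ij / s_ii
    return slope, intercept, variance

# Hypothetical example: mean (1, 4) and covariance [[2, 1], [1, 3]].
delta, gamma, epsilon = conditional_from_precision(1, 4, 2, 1, 3)
slope, intercept, variance = conditional_from_covariance(1, 4, 2, 1, 3)
print(delta, gamma, 1 / epsilon)
```

In this example both routes give the same conditional mean coefficients, and the conditional variance equals the inverse of $\epsilon$.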

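Now, let $p(x_I) = \mathcal{N}(\alpha, \beta^{-1})$ and $q(x_J | x_I) = \mathcal{N}(\delta x_I + \gamma, \epsilon^{-1})$ where $\delta$,
$\gamma$ and $\epsilon$ are real matrices of dimensions, respectively, $|J| \times |I|$, $|J| \times 1$ and
$|J| \times |J|$, and $\beta$ and $\epsilon$ are positive definite. Then,

$$q(x_J | x_I) p(x_I) = \frac{e^{-\frac{1}{2}[(x_J - \delta x_I - \gamma)^T \epsilon (x_J - \delta x_I - \gamma) + (x_I - \alpha)^T \beta (x_I - \alpha)]}}{k}$$

where k is a normalization constant. Note that

$$\begin{aligned}
&-\frac{1}{2}[(x_J - \delta x_I - \gamma)^T \epsilon (x_J - \delta x_I - \gamma) + (x_I - \alpha)^T \beta (x_I - \alpha)] \\
&= -\frac{1}{2}[x_J^T \epsilon x_J - x_J^T \epsilon \delta x_I - x_J^T \epsilon \gamma - (\delta x_I)^T \epsilon x_J + (\delta x_I)^T \epsilon \delta x_I + (\delta x_I)^T \epsilon \gamma \\
&\qquad - \gamma^T \epsilon x_J + \gamma^T \epsilon \delta x_I + x_I^T \beta x_I - x_I^T \beta \alpha - \alpha^T \beta x_I] + k' \\
&= -\frac{1}{2}[x_J^T \epsilon x_J - x_J^T \epsilon \delta x_I - (\delta x_I)^T \epsilon x_J + (\delta x_I)^T \epsilon \delta x_I + x_I^T \beta x_I] \\
&\qquad + x_J^T \epsilon \gamma + x_I^T \beta \alpha - x_I^T \delta^T \epsilon \gamma + k'
\end{aligned}$$

where $k'$ is a constant, i.e. it is independent of $x_{IJ}$. In the last equality above
we have used a reasoning analogous to that in Equation 6.1. By using this
reasoning further and reorganizing some terms we can rewrite the expression
above as

$$-\frac{1}{2}[x_I^T (\beta + \delta^T \epsilon \delta) x_I + x_J^T \epsilon x_J - x_J^T \epsilon \delta x_I - x_I^T \delta^T \epsilon x_J] + x_J^T \epsilon \gamma + x_I^T (\beta \alpha - \delta^T \epsilon \gamma) + k'.$$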

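Then, $q(x_J | x_I) p(x_I)$ can be expressed as

$$\frac{1}{k''} \exp\left\{ -\frac{1}{2} \begin{pmatrix} x_I \\ x_J \end{pmatrix}^T \begin{pmatrix} \beta + \delta^T \epsilon \delta & -\delta^T \epsilon \\ -\epsilon \delta & \epsilon \end{pmatrix} \begin{pmatrix} x_I \\ x_J \end{pmatrix} + \begin{pmatrix} x_I \\ x_J \end{pmatrix}^T \begin{pmatrix} \beta \alpha - \delta^T \epsilon \gamma \\ \epsilon \gamma \end{pmatrix} \right\} \tag{6.4}$$

where $k''$ is a normalization constant. Consequently, $q(x_J | x_I) p(x_I)$ is a
Gaussian distribution over $\begin{pmatrix} x_I \\ x_J \end{pmatrix}$ since it can be expressed in the form given in
Equation 6.2. As we did above, the precision matrix (resp. the mean vector)
of $q(x_J | x_I) p(x_I)$ can easily be found by equating the term that is quadratic
(resp. linear) in x in Equation 6.2 with the term that is quadratic (resp.
linear) in $\begin{pmatrix} x_I \\ x_J \end{pmatrix}$ in Equation 6.4. Specifically, $q(x_J | x_I) p(x_I) = \mathcal{N}(\lambda, \Lambda^{-1})$
where

$$\Lambda = \begin{pmatrix} \beta + \delta^T \epsilon \delta & -\delta^T \epsilon \\ -\epsilon \delta & \epsilon \end{pmatrix}$$

and

$$\lambda = \Lambda^{-1} \begin{pmatrix} \beta \alpha - \delta^T \epsilon \gamma \\ \epsilon \gamma \end{pmatrix} = \begin{pmatrix} \beta^{-1} & \beta^{-1} \delta^T \\ \delta \beta^{-1} & \epsilon^{-1} + \delta \beta^{-1} \delta^T \end{pmatrix} \begin{pmatrix} \beta \alpha - \delta^T \epsilon \gamma \\ \epsilon \gamma \end{pmatrix} = \begin{pmatrix} \alpha \\ \delta \alpha + \gamma \end{pmatrix}.$$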
We have omitted the details of the derivation of $\Lambda^{-1}$ from $\Lambda$, but it can easily
be checked that $\Lambda \Lambda^{-1}$ equals the identity matrix. Note that $\Lambda$ is invertible
because $\beta$ and $\epsilon$ are invertible and, thus, that $q(x_J | x_I) p(x_I)$ is regular.